
    Integrated region- and pixel-based approach to background modelling

    In this paper a new probabilistic method for background modelling is proposed, aimed at video surveillance tasks that use a static monitoring camera. Methods employing Time-Adaptive, Per-Pixel, Mixture of Gaussians (TAPPMOG) modelling have recently become popular due to their intrinsically appealing properties. Nevertheless, they cannot by themselves monitor global changes in the scene, because they model the background as a set of independent pixel processes. In this paper, we propose to integrate this kind of pixel-based information with higher-level region-based information, which also makes it possible to handle sudden changes of the background. These pixel- and region-based modules are naturally and effectively embedded in a probabilistic Bayesian framework, particle filtering, which allows multi-object tracking. Experimental comparison with a classic pixel-based approach shows that the proposed method is effective in recovering from sudden global illumination changes of the background, as well as from limited non-uniform changes of the scene illumination.
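
    The pixel-level half of this pipeline can be made concrete with a small sketch. Below is a minimal, single-pixel version of a time-adaptive Mixture-of-Gaussians model in the spirit of Stauffer and Grimson's TAPPMOG; all parameter values (learning rate, match threshold, background fraction) are illustrative assumptions, and the paper's region-level module and particle-filter integration are not reproduced.

```python
# Minimal sketch: adaptive mixture of K Gaussians for ONE grey-level pixel.
# A real background model keeps one such mixture per pixel (vectorized).
import numpy as np

class PixelMoG:
    def __init__(self, k=3, alpha=0.01, match_sigmas=2.5, bg_fraction=0.7):
        self.w = np.full(k, 1.0 / k)          # component weights
        self.mu = np.linspace(0.0, 255.0, k)  # component means (grey levels)
        self.var = np.full(k, 225.0)          # component variances
        self.alpha = alpha                    # adaptation (learning) rate
        self.match_sigmas = match_sigmas      # match threshold, in std devs
        self.bg_fraction = bg_fraction        # weight mass treated as background

    def update(self, x):
        """Absorb a new grey value x; return True if x is classified background."""
        matched = (x - self.mu) ** 2 < (self.match_sigmas ** 2) * self.var
        if matched.any():
            k = int(np.argmax(matched * self.w / np.sqrt(self.var)))
            rho = self.alpha                  # simplification: rho == alpha
            self.w = (1.0 - self.alpha) * self.w
            self.w[k] += self.alpha
            self.mu[k] += rho * (x - self.mu[k])
            self.var[k] += rho * ((x - self.mu[k]) ** 2 - self.var[k])
        else:                                 # no match: replace weakest component
            k = int(np.argmin(self.w))
            self.mu[k], self.var[k], self.w[k] = x, 900.0, 0.05
        self.w /= self.w.sum()
        # components carrying the first bg_fraction of weight (sorted by
        # weight/sigma) are deemed background
        order = np.argsort(-self.w / np.sqrt(self.var))
        n_bg = int(np.searchsorted(np.cumsum(self.w[order]), self.bg_fraction)) + 1
        return matched.any() and k in order[:n_bg]

# toy usage: a stable pixel around 100, briefly occluded by a bright object
pix = PixelMoG()
for value in [100, 101, 99, 100, 250, 101]:
    print(value, pix.update(value))
```

    Because each pixel evolves independently, a global illumination switch invalidates every mixture at once, which is exactly the failure mode the paper's region-based module is designed to catch.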

    ACOUSTIC RANGE IMAGE SEGMENTATION BY EFFECTIVE MEAN SHIFT

    Image perception in an underwater environment is a difficult task for a human operator, and data segmentation becomes a crucial step towards higher-level interpretation and recognition of the observed scenarios. This paper contributes to the related state of the art by fitting the mean shift clustering paradigm to the segmentation of acoustic range images, providing a segmentation approach that requires no parameter tuning. Moreover, the method actively exploits the connectivity information provided by the range map, using reverse projection as an acceleration technique. The method is therefore able to produce, starting from raw range data, meaningful segmented clouds of points in a fully automatic and efficient fashion.
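
    To illustrate the base paradigm being fitted here, a minimal flat-kernel mean shift on a 3-D point cloud is sketched below. Note this sketch uses a fixed, hand-chosen bandwidth, whereas the paper's contribution is precisely a tuning-free variant accelerated through the range map's connectivity (reverse projection), which is not reproduced.

```python
# Minimal flat-kernel mean shift on a cloud of 3-D range points.
import numpy as np

def mean_shift(points, bandwidth=0.5, n_iter=30, merge_tol=1e-2):
    """Shift every point to the mean of its bandwidth-neighbourhood; points
    that converge to the same mode belong to the same segment."""
    modes = points.copy()
    for _ in range(n_iter):
        for i, m in enumerate(modes):
            neigh = points[np.linalg.norm(points - m, axis=1) < bandwidth]
            if len(neigh):
                modes[i] = neigh.mean(axis=0)
    # merge modes that ended up closer than merge_tol into one cluster label
    labels, centers = -np.ones(len(points), dtype=int), []
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_tol:
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# toy usage: two well-separated patches from a synthetic range map
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(0, 0.1, (50, 3)), rng.normal(3, 0.1, (50, 3))])
labels, centers = mean_shift(cloud)
print(len(centers), "segments")  # expected: 2
```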

    Scalable and Compact 3D Action Recognition with Approximated RBF Kernel Machines

    Despite the recent deep learning (DL) revolution, kernel machines remain powerful methods for action recognition. DL has brought the use of large datasets, and this is typically a problem for kernel approaches, which do not scale up efficiently due to kernel Gram matrices. Nevertheless, kernel methods are still attractive and more generally applicable, since they can equally manage datasets of different sizes, including cases where DL techniques show limitations. This work investigates these issues by proposing an explicit approximated representation that, together with a linear model, is an equivalent, yet scalable, implementation of a kernel machine. Our approximation is directly inspired by the exact feature map induced by an RBF Gaussian kernel but, unlike the latter, it is finite-dimensional and very compact. We justify the soundness of our idea with a theoretical analysis which proves the unbiasedness of the approximation and provides a vanishing bound for its variance, which is shown to decrease much more rapidly than in alternative methods in the literature. In a broad experimental validation, we assess the superiority of our approximation in terms of 1) ease and speed of training, 2) compactness of the model, and 3) improvements with respect to state-of-the-art performance.
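
    The general recipe described here, replacing an RBF kernel machine with an explicit finite-dimensional feature map plus a linear model, can be sketched with the classic random Fourier features of Rahimi and Recht; the paper proposes its own, more compact and lower-variance map, so this is a stand-in for the idea rather than the paper's estimator.

```python
# Random Fourier features: z(x).z(y) ~= exp(-gamma * ||x - y||^2)
import numpy as np

def rff_map(X, n_features=512, gamma=1.0, seed=0):
    """Explicit finite-dimensional approximation of the RBF feature map."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# sanity check: approximated kernel vs exact kernel on random data
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 10))
Z = rff_map(X, n_features=4096, gamma=0.1)
K_approx = Z @ Z.T
K_exact = np.exp(-0.1 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
print(np.abs(K_approx - K_exact).max())  # small; shrinks as n_features grows
```

    A linear classifier trained on Z then behaves like the corresponding RBF kernel machine, with training cost linear in the number of samples instead of quadratic in the Gram matrix.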

    Enhancing visual embeddings through weakly supervised captioning for zero-shot learning

    Visual features designed for image classification have been shown to be useful in zero-shot learning (ZSL) when generalizing towards classes not seen during training. In this paper, we argue that a more effective way of building visual features for ZSL is to extract them through captioning, so as not merely to classify an image but, instead, to describe it. However, modern captioning models rely on a massive level of supervision, e.g. up to 15 extended human-provided descriptions per instance, which is simply not available for ZSL benchmarks. In the latter, in fact, the available annotations only indicate the presence or absence of attributes from a fixed list. Worse, attributes are seldom annotated at the image level, but rather at the class level only; because of this, the annotation cannot be visually grounded. In this paper, we deal with such a weakly supervised regime to train an end-to-end LSTM captioner, whose backbone CNN image encoder can provide better features for ZSL. Our enhancement of visual features, called 'VisEn', is compatible with any generic ZSL method, without requiring changes in its pipeline (apart from adapting hyper-parameters). Experimentally, VisEn sharply improves recognition performance on unseen classes, as we demonstrate through an ablation study encompassing different ZSL approaches. Further, on the challenging fine-grained CUB dataset, VisEn improves on state-of-the-art methods by a margin, while using visual descriptors one order of magnitude smaller.
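
    A rough sketch of the captioner structure the abstract describes follows: a CNN encoder feeds an LSTM trained to emit the class-level attribute words as a pseudo-caption, and the encoder's features are then reused downstream. Module names and sizes are illustrative assumptions; the paper's exact architecture and losses are not reproduced.

```python
# Hypothetical weakly supervised captioner: CNN encoder + LSTM decoder over
# attribute-word tokens (the only annotation available in ZSL benchmarks).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class WeakCaptioner(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = resnet18(weights=None)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # 512-d pooled
        self.img_proj = nn.Linear(512, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def features(self, images):
        """ZSL-ready visual embedding (what gets reused downstream)."""
        return self.encoder(images).flatten(1)

    def forward(self, images, attr_tokens):
        # prepend the projected image as the first "token", then teacher-force
        img = self.img_proj(self.features(images)).unsqueeze(1)
        seq = torch.cat([img, self.embed(attr_tokens)], dim=1)
        out, _ = self.lstm(seq)
        return self.head(out[:, :-1])  # logits for each attribute token

# training: cross-entropy between forward(...) and attr_tokens; afterwards,
# model.features(...) provides the embeddings for any generic ZSL pipeline.
```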

    A Unifying Framework in Vector-valued Reproducing Kernel Hilbert Spaces for Manifold Regularization and Co-Regularized Multi-view Learning

    This paper presents a general vector-valued reproducing kernel Hilbert space (RKHS) framework for the problem of learning an unknown functional dependency between a structured input space and a structured output space. Our formulation encompasses both Vector-valued Manifold Regularization and Co-regularized Multi-view Learning, providing in particular a unifying framework linking these two important learning approaches. In the case of the least squares loss function, we provide a closed-form solution, obtained by solving a system of linear equations. In the case of Support Vector Machine (SVM) classification, our formulation generalizes both the binary Laplacian SVM to the multi-class, multi-view setting and the multi-class Simplex Cone SVM to the semi-supervised, multi-view setting. The solution is obtained by solving a single quadratic optimization problem, as in standard SVM, via the Sequential Minimal Optimization (SMO) approach. Empirical results obtained on the task of object recognition, using several challenging datasets, demonstrate the competitiveness of our algorithms compared with other state-of-the-art methods.
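
    As a worked special case of the closed form mentioned for the least squares loss, the sketch below solves scalar-output Laplacian-regularized least squares; the vector-valued, multi-view formulation in the paper reduces to a structurally similar linear system. The regularization weights lam_a and lam_i are illustrative assumptions.

```python
# Scalar Laplacian RLS: solve (J K + lam_a I + lam_i L K) alpha = J y
# for the coefficients of f(x) = sum_i alpha_i k(x, x_i).
import numpy as np

def laplacian_rls(K, L, y, labeled, lam_a=1e-2, lam_i=1e-2):
    n = K.shape[0]
    J = np.zeros((n, n))
    J[labeled, labeled] = 1.0                      # selects the labeled points
    A = J @ K + lam_a * np.eye(n) + lam_i * L @ K  # the linear system's matrix
    return np.linalg.solve(A, J @ y)

# toy usage: RBF kernel + crude kNN graph Laplacian over 2-D points
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
D2 = np.sum((X[:, None] - X[None]) ** 2, axis=-1)
K = np.exp(-D2)
W = (D2 < np.sort(D2, axis=1)[:, [5]]) * 1.0       # 5-NN adjacency
W = np.maximum(W, W.T)
L = np.diag(W.sum(1)) - W                          # unnormalized graph Laplacian
y = np.sign(X[:, 0])                               # labels; only 4 revealed
alpha = laplacian_rls(K, L, y, labeled=np.arange(4))
pred = np.sign(K @ alpha)                          # semi-supervised predictions
```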

    Stel component analysis: Modeling spatial correlations in image class structure


    Adaptation of Person Re-identification Models for On-boarding New Camera(s)

    Existing approaches for person re-identification have concentrated on either designing the best feature representation or learning optimal matching metrics in a static setting where the number of cameras in a network is fixed. Most approaches have neglected the dynamic and open-world nature of the re-identification problem, where one or multiple new cameras may be temporarily on-boarded into an existing system to gather additional information, or added to expand an existing network. To address this very practical problem, we propose a novel approach for adapting existing multi-camera re-identification frameworks with limited supervision. First, we formulate a domain-perceptive re-identification method based on the geodesic flow kernel that can effectively find the best source camera (already installed) to adapt to the newly introduced target camera(s), without requiring a very expensive training phase. Second, we introduce a transitive inference algorithm for re-identification that can exploit the information from the best source camera to improve the accuracy across other camera pairs in a network of multiple cameras. Third, we develop a target-aware sparse prototype selection strategy for finding an informative subset of source camera data for data-efficient learning in resource-constrained environments. Our approach can greatly increase the flexibility and reduce the deployment cost of new cameras in many real-world dynamic camera networks. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art unsupervised alternatives whilst being extremely efficient to compute.
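
    The "find the best source camera" step can be sketched as follows. The paper scores source cameras with a geodesic flow kernel between feature subspaces; as a simplified, assumption-laden proxy, this sketch instead ranks sources by the principal angles between PCA subspaces of source and target camera features.

```python
# Proxy for GFK-based source selection: rank installed cameras by how well
# their feature subspace aligns with the new (target) camera's subspace.
import numpy as np

def pca_subspace(feats, dim=10):
    """Orthonormal basis of the top principal directions of the features."""
    X = feats - feats.mean(0)
    U, _, _ = np.linalg.svd(X.T @ X)
    return U[:, :dim]

def subspace_similarity(P_src, P_tgt):
    """Mean cosine of principal angles; 1.0 means identical subspaces."""
    s = np.linalg.svd(P_src.T @ P_tgt, compute_uv=False)
    return float(s.mean())

def best_source(source_feats, target_feats, dim=10):
    """source_feats: list of (n_i, d) arrays, one per installed camera."""
    P_t = pca_subspace(target_feats, dim)
    scores = [subspace_similarity(pca_subspace(F, dim), P_t)
              for F in source_feats]
    return int(np.argmax(scores)), scores
```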

    Intra-Camera Supervised Person Re-Identification: A New Benchmark

    Existing person re-identification (re-id) methods rely mostly on a large set of inter-camera identity-labelled training data, requiring a tedious data collection and annotation process and therefore leading to poor scalability in practical re-id applications. To overcome this fundamental limitation, we consider person re-identification without inter-camera identity association, but only with identity labels independently annotated within each individual camera view. This eliminates the most time-consuming and tedious inter-camera identity labelling process, significantly reducing the amount of human effort required during annotation. It hence gives rise to a more scalable and more feasible learning scenario, which we call Intra-Camera Supervised (ICS) person re-id. Under this ICS setting with weaker label supervision, we formulate a Multi-Task Multi-Label (MTML) deep learning method. Given no inter-camera association, MTML is specially designed to self-discover the inter-camera identity correspondence. This is achieved by inter-camera multi-label learning under a joint multi-task inference framework. In addition, MTML can also efficiently learn discriminative re-id feature representations by fully using the available identity labels within each camera view. Extensive experiments demonstrate the performance superiority of our MTML model over state-of-the-art alternative methods on three large-scale person re-id datasets in the proposed intra-camera supervised learning setting.

    Comment: 9 pages, 3 figures; accepted by the ICCV Workshop on Real-World Recognition from Low-Quality Images and Videos, 2019.
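
    The multi-task structure described here can be sketched as a shared backbone with one identity-classification head per camera, each trained only on its own camera's intra-camera labels. The paper's cross-camera multi-label association step is not reproduced, and all sizes and names below are illustrative assumptions.

```python
# Per-camera multi-task heads over a shared feature backbone.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class IntraCameraMTML(nn.Module):
    def __init__(self, ids_per_camera):
        super().__init__()
        cnn = resnet18(weights=None)
        self.backbone = nn.Sequential(*list(cnn.children())[:-1])  # shared
        # one classifier per camera view: one "task" per camera
        self.heads = nn.ModuleList(nn.Linear(512, n) for n in ids_per_camera)

    def forward(self, images, camera_id):
        f = self.backbone(images).flatten(1)
        return self.heads[camera_id](f)  # logits over that camera's identities

# training: for a batch drawn from camera c with intra-camera labels y,
# minimize cross_entropy(model(images, c), y); at test time, the 512-d
# backbone features are used for cross-camera matching.
```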